THIS WORK IS YET TO BE REVIEWED

1 Introduction

This paper contains estimates for the reproductive number \(R_{t,m}\) over time \(t\) in various countries \(m\). It also shows the same for provinces within South Africa.. This is done using the methodology as described in [1]. These have been implemented in R using EpiEstim package [2] which is what is used here.

This paper and it’s results should be updated roughly daily and is available online.

2 Updates

As this paper is updated over time this section will summarise significant changes. The code producing this paper is tracked using Git. The Git commit hash for this project at the time of generating this paper was 882600f2cf8d05a50be890458d3213158591870a.

2020-06-12

  • Initial version.

2020-06-13

  • Combine the version for South Africa and other countries.

2020-06-14

  • Add Google Mobility data for comparison and further modelling.

2020-06-15

  • Add more details about methodology.

3 Libraries

The project uses the following libraries.

require(EpiEstim)
require(EnvStats)
require(ggplot2)
require(ggpubr)
require(lubridate)
require(utils)
require(httr)
require(dplyr)
require(tidyr)
require(scales)

4 Countries

All countries available in the data will be analysed but also the provinces of South Africa. This paper produces charts for the following countries only:

  • Brazil
  • Canada
  • Chile
  • India
  • Ireland
  • Italy
  • Peru
  • South Africa
  • Spain
  • United Kingdom
country_codes = c("BR", "CA", "CL", "IN", "IE", "IT", "PE", "ZA", "ES", "UK")

5 Data

5.1 South Africa

5.1.1 Download

Data is downloaded from the Git repository associated with [3].

# Provincial Deaths
GET(url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv", write_disk("covid19za_provincial_cumulative_timeline_deaths.csv", 
    overwrite = TRUE))
Response [https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv]
  Date: 2020-06-16 09:49
  Status: 200
  Content-Type: text/plain; charset=utf-8
  Size: 7.62 kB
<ON DISK>  C:\Users\lrossou\Desktop\COVID-19\rt_estimates\covid19za_provincial_cumulative_timeline_deaths.csv
# Provincial Cases
GET(url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv", write_disk("covid19za_provincial_cumulative_timeline_confirmed.csv", 
    overwrite = TRUE))
Response [https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv]
  Date: 2020-06-16 09:49
  Status: 200
  Content-Type: text/plain; charset=utf-8
  Size: 9.59 kB
<ON DISK>  C:\Users\lrossou\Desktop\COVID-19\rt_estimates\covid19za_provincial_cumulative_timeline_confirmed.csv

5.1.2 Data preperation

First read in the data from the downloaded comma-separated values text file.

# Read from CSVs above
data_za_cases <- read.csv("covid19za_provincial_cumulative_timeline_confirmed.csv", stringsAsFactors = FALSE)
data_za_deaths <- read.csv("covid19za_provincial_cumulative_timeline_deaths.csv", stringsAsFactors = FALSE)

In the case data file row 21 and 32 contain no provincial details. We estimated it by spreading the national total to the provinces in proportion to a mixture of the prior day and the next day.

data_za_cases[21, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <- colSums(data_za_cases[c(20, 22), c("EC", "FS", "GP", 
    "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")])/sum(data_za_cases[c(20, 22), ]$total) * data_za_cases[21, ]$total
data_za_cases[32, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <- colSums(data_za_cases[c(31, 33), c("EC", "FS", "GP", 
    "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")])/sum(data_za_cases[c(31, 33), ]$total) * data_za_cases[32, ]$total

The following function will be applied to case and death data. In this function the following occurs:

  1. Scale up the per province data for unknown values.
  2. This results in provincial data which are not whole numbers. These are rounded to the nearest whole number.
  3. A SA column is added as the sum of the new per province data.
  4. Data is formatted and disaggregated such that item represents the incremental cases or deaths rather than cumulative figures.
  5. Data is filled with data (albeit with 0 cases or deaths) for all dates in the range.
  6. Any incremental case or death counts that are negative are zeroeised.
  7. New cumulative figures are calculated.
fix_data_za <- function(data_za, start_date = as.Date("2020-03-01"), end_date = as.Date("2020-03-31")) {
    # Scale provinces by scale factor (assume unknown are in proportion)
    data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <- data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] * 
        (1 + data_za$UNKNOWN/rowSums(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")]))
    
    # Only select columns we need
    data_za <- data_za %>% select("date", "EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")
    data_za$date <- as.Date(data_za$date, "%d-%m-%Y")
    
    # Round data so we have integer cases
    data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <- round(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", 
        "WC")], 0)
    
    # Calculate a new SA column
    data_za$SA <- rowSums(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")])
    
    # 'Melt' the data
    data_za <- pivot_longer(data_za, cols = c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "SA"), names_to = "province", values_to = "count")
    
    # Getting daily data from the cumulative data set
    data_za = data_za %>% group_by(province) %>% arrange(date) %>% mutate(count = count - lag(count, default = 0)) %>% ungroup()
    
    # add missing dates
    all_dates <- expand_grid(date = seq(start_date, end_date, 1), province = levels(as.factor(data_za$province)))
    
    # join
    data_za <- left_join(all_dates, data_za, by = c("date", "province"))
    
    # province factor
    data_za$province <- as.factor(data_za$province)
    
    # 0 for NAs
    data_za$count <- ifelse(is.na(data_za$count), 0, data_za$count)
    
    # remove negatives
    data_za$count <- ifelse(data_za$count < 0, 0, data_za$count)
    
    # get cumulative counts
    data_za <- data_za %>% group_by(province) %>% mutate(cumulative_count = cumsum(count)) %>% ungroup()
    
    return(data_za)
}

Below we use the function above to process deaths and cases and then combine them into a single dataset.

start_date <- min(as.Date(data_za_cases$date, "%d-%m-%Y"))
end_date <- max(as.Date(data_za_cases$date, "%d-%m-%Y"))
data_za_cases <- fix_data_za(data_za_cases, start_date = start_date, end_date = end_date)
data_za_deaths <- fix_data_za(data_za_deaths, start_date = start_date, end_date = end_date)

data_za_cases <- cbind("cases", data_za_cases)
data_za_deaths <- cbind("deaths", data_za_deaths)
colnames(data_za_cases)[1] <- "type"
colnames(data_za_deaths)[1] <- "type"

# combined
data_za <- rbind(data_za_cases, data_za_deaths)

# remove data sets no longer needed
rm("data_za_cases", "data_za_deaths", "start_date", "end_date", "fix_data_za")

5.2 Other countries

5.2.1 Download

Data for other countries are downloaded from [4].

GET(url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv", write_disk("ecdc_casedistribution.csv", overwrite = TRUE))
Response [https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/]
  Date: 2020-06-16 09:49
  Status: 200
  Content-Type: application/octet-stream
  Size: 1.39 MB
<ON DISK>  C:\Users\lrossou\Desktop\COVID-19\rt_estimates\ecdc_casedistribution.csv

5.2.2 Data preperation

First read in the data from the downloaded comma-separated values text file.

# Read from CSVs above
data <- read.csv("ecdc_casedistribution.csv", stringsAsFactors = FALSE)

Update data by doing the following:

  1. Format dates.
  2. Set column names.
  3. Add data where dates were skipped (with 0 cases or deaths)
  4. “Melt” the data down.
  5. Set fields as factors as needed.
  6. Get cumulative numbers.
# get date
data$date <- as.Date(data$dateRep, format = "%d/%m/%Y")

# get cases and deaths
data <- data %>% select(continentExp, date, geoId, countriesAndTerritories, cases, deaths)

# column names
colnames(data) <- c("continent", "date", "country_code", "country", "cases", "deaths")

# add 0 data for dates that were skipped
for (c in levels(as.factor(data$country))) {
    c_dates <- (data %>% filter(country == c))$date
    check_dates <- seq(min(c_dates), max(c_dates), by = 1)
    missing_dates <- check_dates[!check_dates %in% c_dates]
    continent <- (data %>% filter(country == c))$continent[1]
    country_code <- (data %>% filter(country == c))$country_code[1]
    if (length(missing_dates) > 0) {
        missing_data <- data.frame(continent = continent, date = missing_dates, country_code = country_code, country = rep(c, length(missing_dates)), 
            cases = rep(0, length(missing_dates)), deaths = rep(0, length(missing_dates)))
        data <- rbind(data, missing_data)
    }
}

# 'Melt' the data
data <- pivot_longer(data, cols = c("cases", "deaths"), names_to = "type", values_to = "count")

# fields as factors
data$continent <- as.factor(data$continent)
data$country_code <- as.factor(data$country_code)
data$country <- as.factor(data$country)
data$type <- as.factor(data$type)

# make sure it is a data frame
data <- as.data.frame(data)

# zeroise counts below 0
data$count <- ifelse(data$count < 0, 0, data$count)

# order and get group by
data <- data %>% group_by(country_code, country, type) %>% arrange(country_code, country, type, date) %>% mutate(cumulative_count = cumsum(count)) %>% 
    ungroup()

5.3 Google Mobility Data

Google released mobility data indexes that deviations from a baseline of movement during 3 January to 6 February 2020 [5]. This data is downloaded and prepared for linking to our estimates of \(R_{t,m}\).

5.3.1 Download

The data is made available as a comma-separated file which is updated regularly but not daily. Data is sometimes up to a week behind.

GET(url = "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv", write_disk("google_mobility_data.csv", overwrite = TRUE))
Response [https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv]
  Date: 2020-06-16 09:50
  Status: 200
  Content-Type: text/csv
  Size: 35.6 MB
<ON DISK>  C:\Users\lrossou\Desktop\COVID-19\rt_estimates\google_mobility_data.csv

5.3.2 Preparing the mobility data

First read in the file:

# Read from CSVs above
google_mobility_data <- read.csv("google_mobility_data.csv", stringsAsFactors = FALSE, na.strings = "")

Below the data is prepared by doing the following:

  1. Fixing dates
  2. Retaining only country level mobility data, except for South Africa where country and provincial data is retained.
  3. Renaming the fields consistently.
  4. Various transformations to make joining to other data easier later.
# fix dates
google_mobility_data$date <- as.Date(google_mobility_data$date, format = "%Y-%m-%d")

# keep only country level data but keep all data for ZA
google_mobility_data <- google_mobility_data %>% filter(country_region_code == "ZA" | is.na(sub_region_1))

# pick only the fields required
google_mobility_data <- google_mobility_data %>% select(country_region_code, country_region, sub_region_1, iso_3166_2_code, date, retail_and_recreation_percent_change_from_baseline, 
    grocery_and_pharmacy_percent_change_from_baseline, parks_percent_change_from_baseline, transit_stations_percent_change_from_baseline, workplaces_percent_change_from_baseline, 
    residential_percent_change_from_baseline)

# rename the fields
colnames(google_mobility_data) <- c("country_code", "country", "province", "province_code", "date", "retail_and_recreation", "grocery_and_pharmacy", 
    "parks", "transit_stations", "workplaces", "residential")

# move province data to country for ease later
google_mobility_data$country_code[!is.na(google_mobility_data$province_code)] <- google_mobility_data$province_code[!is.na(google_mobility_data$province_code)]
google_mobility_data$country[!is.na(google_mobility_data$province_code)] <- google_mobility_data$province[!is.na(google_mobility_data$province_code)]

# rename KZN consistently
google_mobility_data$country_code[google_mobility_data$country_code == "ZA-NL"] <- "ZA-KZN"

# drop province columns
google_mobility_data <- google_mobility_data[, c(-3, -4)]

# divide indexes by 100 to have numbers between 0 and 1
google_mobility_data[, 4:9] <- google_mobility_data[, 4:9]/100

# fix GB code to UK
google_mobility_data$country_code[google_mobility_data$country_code == "GB"] <- "UK"

# make factors
google_mobility_data$country_code <- as.factor(google_mobility_data$country_code)
google_mobility_data$country <- as.factor(google_mobility_data$country)

Further to the above we also calculate an average mobility index per [6]. This index averages the mobility indexes but exclude the following indexes:

  • Residential index is excluded as it’s less likely to drive the epidemic (and is inversely correlated),
  • Mobility in parks is also unlikely to be a driving factor.
google_mobility_data$average_mobility <- (google_mobility_data$retail_and_recreation + google_mobility_data$grocery_and_pharmacy + google_mobility_data$transit_stations + 
    google_mobility_data$workplaces)/4

5.4 Combine South African data with other data

To simplify further analysis it’s easier to combine all data in a single dataset:

# Add a 'continent' field to data_za and combine
data_za <- cbind("SA", data_za)
colnames(data_za)[1] <- "continent"
colnames(data_za)[4] <- "country"
data_za$country_code <- paste0("ZA-", data_za$country)
data <- rbind(data, data_za)

# remove all objects except what we need
rm(list = ls()[!ls() %in% c("data", "country_codes", "google_mobility_data")])

6 Basic Exploration

6.1 South Africa

Below we plot cumulative case count on a log scale by province:

ggplot(data %>% filter(continent == "SA" & type == "cases" & cumulative_count > 0), aes(x = date, y = cumulative_count)) + geom_line(aes(color = country), 
    size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Cases by Province") + xlab("Date") + ylab("Cumulative Cases") + theme_bw() + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)

Below we plot the cumulative deaths by province on a log scale:

ggplot(data %>% filter(continent == "SA" & type == "deaths" & cumulative_count > 0), aes(x = date, y = cumulative_count)) + geom_line(aes(color = country), 
    size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Deaths by Province") + xlab("Date") + ylab("Cumulative Deaths") + theme_bw() + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)

Below we plot average mobility indexes by province:

ggplot(data = google_mobility_data[substr(as.character(google_mobility_data$country_code), 1, 2) == "ZA", ], aes(x = date, y = average_mobility, 
    colour = country)) + geom_line() + scale_color_hue(l = 50) + theme_bw()

6.2 Other Countries

Below we plot cumulative case count on a log scale by country:

ggplot(data %>% filter(continent != "SA" & type == "cases" & country_code %in% country_codes & cumulative_count > 0), aes(x = date, y = cumulative_count)) + 
    geom_line(aes(color = country), size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Cases by Country") + xlab("Date") + ylab("Cumulative Cases") + 
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + 
    scale_color_hue(l = 50)

Below we plot the cumulative deaths by country on a log scale:

ggplot(data %>% filter(continent != "SA" & type == "deaths" & country_code %in% country_codes & cumulative_count > 0), aes(x = date, y = cumulative_count)) + 
    geom_line(aes(color = country), size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Deaths by Country") + xlab("Date") + ylab("Cumulative Deaths") + 
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + 
    scale_color_hue(l = 50)

Below we plot average mobility indexes by country:

ggplot(data = google_mobility_data %>% filter(country_code %in% country_codes), aes(x = date, y = average_mobility, colour = country)) + geom_line() + 
    scale_color_hue(l = 50) + theme_bw()

7 Method & Assumptions

This analysis follows the follows the method proposed [1].

Essentially this models the following relationship:

\(E(I_{t,m})=R_{t,m}\sum_{s=1}^tI_{t-s,m}w_{s}\) where:

  • \(m\) is the country of province in case of South African data.
  • \(I_{t,m}\) is the number of cases (or deaths) reported at time \(t\).
  • \(R_{t,m}\) is the “instantaneous reproduction number” to quote [1].
  • \(I_{t,m} \sim Poisson\) with the above mean.
  • \(w_{s}\) is the infectivity profile of an individual over time \(s\).

The formulation of instantaneous rate of transmission proposed in [1] is convenient because it ensures that \(R_{t,m}\) only depends on information that is known at time \(t\). We do not need to know about future transmission.

7.1 Serial Interval

To do further analysis an serial interval assumption is needed (\(w_{s}\)). The serial interval is taken from [7]. It’s assumed to be Gamma distributed with mean of 6.48 and standard deviation of 3.83. This corresponds to the effective infectiousness of an individual since acquiring the infection themselves.

We plot this serial distribution below:

si_mean <- 6.48
si_sd <- 3.83
si_cv <- si_sd/si_mean

serial_interval = rep(0, 1)
serial_interval[1] = (pgammaAlt(1.5, si_mean, si_cv) - pgammaAlt(0, si_mean, si_cv))
for (i in 2:50) {
    serial_interval[i] = (pgammaAlt(i + 0.5, si_mean, si_cv) - pgammaAlt(i - 0.5, si_mean, si_cv))
}
ggplot(data.frame(days = seq(1, 20, 1), serial_interval = serial_interval[1:20]), aes(y = serial_interval, x = days)) + geom_line(size = 1) + 
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + 
    scale_color_hue(l = 50)

7.2 Time Window Size

# sliding window
tau <- 7

[1] describes a choice of a time window to estimate \(R_{t,m}\). Based on this analysis we chose a time window of 7-days (\(\tau\) in using notation of [1]). During this window \(R_{t,m}\) is assumed to be constant. However information prior that is taken into account.

Note that the time window size does not interact with the serial interval as information prior to the window is taken into account. It’s just the period during which \(R_{t,m}\) is assumed to be constant.

Based on the table in Appendix 2 of [1] we believe a window of 7-days to be reasonable as long as that window contains 12 or more cases or deaths.

8 Estimate \(R_{t,m}\)

8.1 Estimation Routine

The following code works backwards and fits \(R_{t,m}\) values for whole weeks (ending on the last date in the data) using the EpiEstim package. Using deaths or infection the as the cases in the puts the estimate of \(R_{t,m}\) as at the date of the cases being tracked.

So \(t\) is based on the date of reporting (be that cases or deaths) and \(m\) is the country or province. Two values are estimated for each country:

  1. \(R_{t,m}^{cases}\) is the reproductive number implied by the cases reported at time \(t\) in country \(m\).
  2. \(R_{t,m}^{deaths}\) is the reproductive number implied by the deaths reported at time \(t\) in country \(m\).

Note that the time periods are left unadjusted, though in reality the \(R_{t,m}^{deaths}\) should be shifted back approximately 2 weeks relative to \(R_{t,m}^{cases}\).

Rt_data <- NULL
for (c in levels(data$country)) {
    for (t in levels(data$type)) {
        # filter out country data of type t
        c_data <- data %>% filter(country == c & type == t)
        
        # vector of count of cases/deaths
        I <- c_data$count
        
        # t=1 corresponds to this date:
        t1_date <- min(c_data$date)
        
        # only continue if c_data$cumulative_count > 12
        if (max(c_data$cumulative_count) >= 12) {
            # the day after the first case/death:
            t_start_date <- min((c_data %>% filter(cumulative_count >= 12))$date) + 1
            t_start <- min(seq(1, length(I))[c_data$cumulative_count >= 12]) + 1
            
            # last day of cases/deaths
            t_end <- length(I)
            t_end_date <- t1_date + (t_end - 1)
            
            # how many full weeks do we have
            full_weeks <- floor((t_end - t_start)/tau)
            
            # only continue if we have 1 full week or more
            if (full_weeks > 0) {
                # then divide period into weeks
                T.Start <- seq(t_end - tau * full_weeks, t_end - tau, by = tau)
                T.End <- T.Start + tau
                
                # estimate $R_{t,m}^{type}$ for each week:
                c_Rt <- estimate_R(incid = I, config = make_config(t_start = T.Start, t_end = T.End, mean_si = si_mean, std_si = si_sd), method = "parametric_si")$R
                
                # count cases
                c_data$week <- floor(as.numeric(c_data$date - (t1_date + T.Start[1]))/tau)
                c_data$dd <- c(0, c_data$date[2:nrow(c_data)] - c_data$date[1:(nrow(c_data) - 1)])
                c_agg_data <- c_data %>% filter(week >= 0) %>% group_by(week) %>% summarise(count = sum(count), .groups = "drop")
                
                # add country and type designations
                c_Rt <- cbind(c_data$continent[1], c_data$country_code[1], c, t, c_Rt, c_agg_data$count)
                
                # and add dates
                c_Rt$date_start <- t1_date + c_Rt$t_start
                c_Rt$date_end <- t1_date + c_Rt$t_end - 1
                
                # combine the results
                if (is.null(Rt_data)) {
                  Rt_data <- c_Rt
                } else {
                  Rt_data <- rbind(Rt_data, c_Rt)
                }
            }
        }
    }
}

8.2 Prepping data for tabulation and plots

Below we prep column headings and prepare CI data:

# Column names
colnames(Rt_data) <- c("continent", "country_code", "country", "type", "t_start", "t_end", "Rt_mean", "Rt_std", "Rt_li_95", "Rt_li_90", "Rt_li_50", 
    "Rt_median", "Rt_ui_50", "Rt_ui_90", "Rt_ui_95", "count", "date_start", "date_end")

# mid week data point
Rt_data$date_mid <- Rt_data$date_start + (Rt_data$date_end - Rt_data$date_start)/2

# split out CI data into different dataset
Rt_ci_data <- NULL
for (ci in c("50", "90", "95")) {
    r <- data.frame(country_code = Rt_data$country_code, country = Rt_data$country, type = Rt_data$type, ci = rep(paste0(ci, "%"), nrow(Rt_data)), 
        date_mid = Rt_data$date_mid, Rt_mean = Rt_data$Rt_mean, Rt_li = Rt_data[[paste0("Rt_li_", ci)]], Rt_ui = Rt_data[[paste0("Rt_ui_", 
            ci)]])
    Rt_ci_data <- rbind(r, Rt_ci_data)
}

# Factors
Rt_data$type <- as.factor(Rt_data$type)
Rt_data$country <- as.factor(Rt_data$country)

# remove what is not needed
rm(list = ls()[!ls() %in% c("data", "country_codes", "Rt_data", "Rt_ci_data", "google_mobility_data", "tau")])

9 Linking \(R_{t,m}\) estimates to mobility

9.1 Preparing Rt_data

Linking the mobility data should be done in a way that corresponds to the time the infection should have occurred not the time of death. Below new date ranges are calculated to correspond roughly to the time of infection:

Rt_data$date_inf_start <- Rt_data$date_start - ifelse(Rt_data$type == "death", 23 + 7, 6 + 7)
Rt_data$date_inf_end <- Rt_data$date_end - ifelse(Rt_data$type == "death", 23 + 7, 6 + 7)
Rt_data$date_inf_mid <- Rt_data$date_mid - ifelse(Rt_data$type == "death", 23 + 7, 6 + 7)

9.2 Calculating weekly mobility data

The code below calculates average weekly mobility data averaged over the same date ranges as calculated above.

weekly_mobility_data <- NULL
combined_country_codes <- levels(Rt_data$country_code)[levels(Rt_data$country_code) %in% levels(google_mobility_data$country_code)]
for (c in combined_country_codes) {
    for (t in levels(Rt_data$type)) {
        c_data <- Rt_data %>% filter(country_code == c & type == t)
        if (nrow(c_data) > 0) {
            for (i in 1:nrow(c_data)) {
                w_data <- google_mobility_data %>% filter(country_code == c & date >= c_data$date_inf_start[i] & date <= c_data$date_inf_end[i]) %>% 
                  group_by(country_code) %>% summarise(retail_and_recreation = mean(retail_and_recreation), grocery_and_pharmacy = mean(grocery_and_pharmacy), 
                  parks = mean(parks), transit_stations = mean(transit_stations), workplaces = mean(workplaces), residential = mean(residential), 
                  average_mobility = mean(average_mobility), .groups = "drop")
                if (nrow(w_data) > 0) {
                  if (!is.na(w_data$country_code)) {
                    w_data$type <- t
                    w_data$date_inf_mid <- c_data$date_inf_mid[i]
                    weekly_mobility_data <- rbind(weekly_mobility_data, w_data)
                  }
                }
            }
        }
    }
}

9.3 Joining with \(R_{t,m}\) estimates

Based on work above the weekly mobility data can be joined:

Rt_data <- left_join(Rt_data, weekly_mobility_data, by = c("country_code", "type", "date_inf_mid"))

10 Modelling \(R_{t,m}\) as a function of Mobility Indexes

TO DO

11 Results

11.1 Current \(R_{t,m}\) estimates

Below current (last weekly) \(R_{t,m}\) estimates are tabulated. To so the data first needs to be prepared:

# find the last estimates
last_dates <- Rt_data %>% group_by(country, type) %>% summarise(date_mid = max(date_mid), .groups = "drop")

# construct a table with nice fields names
table <- inner_join(last_dates, Rt_data, by = c("country", "type", "date_mid"))
table <- table %>% select(continent, country_code, country, type, count, date_end, Rt_li_95, Rt_mean, Rt_ui_95)

11.1.1 South Africa

knitr::kable((table %>% filter(continent == "SA"))[, c(-1, -2)], format.args = list(big.mark = ","), digits = 1, col.names = c("Country", "Estimated Type", 
    "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
Country Estimated Type Count (Last Week) Week Ending R - Lower CI R - Mean R - Uppper CI
EC cases 4,260 2020-06-15 1.6 1.7 1.7
EC deaths 109 2020-06-15 1.8 2.2 2.6
FS cases 139 2020-06-15 1.2 1.5 1.7
GP cases 5,940 2020-06-15 2.2 2.2 2.3
GP deaths 40 2020-06-15 1.5 2.0 2.7
KZN cases 785 2020-06-15 1.1 1.2 1.3
KZN deaths 7 2020-06-15 0.5 1.0 1.7
LP cases 118 2020-06-15 1.7 2.0 2.3
MP cases 139 2020-06-15 1.5 1.8 2.1
NC cases 87 2020-06-15 1.7 2.1 2.6
NW cases 598 2020-06-15 1.6 1.7 1.8
SA cases 22,653 2020-06-15 1.3 1.4 1.4
SA deaths 488 2020-06-15 1.2 1.4 1.5
WC cases 10,587 2020-06-15 1.0 1.1 1.1
WC deaths 327 2020-06-15 1.1 1.2 1.3

11.1.2 Other Countries

The table below contains \(R_{t,m}\) estimates for the last week available. These estimates are based either on cases or deaths and a 95% confidence interval is shown.

knitr::kable((table %>% filter(continent != "SA" & country_code %in% country_codes))[, c(-1, -2)], format.args = list(big.mark = ","), digits = 1, 
    col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
Country Estimated Type Count (Last Week) Week Ending R - Lower CI R - Mean R - Uppper CI
Brazil cases 180,859 2020-06-16 1.0 1.0 1.0
Brazil deaths 6,825 2020-06-16 0.9 0.9 1.0
Canada cases 2,903 2020-06-16 0.7 0.7 0.7
Canada deaths 340 2020-06-16 0.6 0.6 0.7
Chile cases 40,590 2020-06-16 1.1 1.1 1.2
Chile deaths 1,098 2020-06-16 1.0 1.0 1.1
India cases 76,493 2020-06-16 1.1 1.1 1.1
India deaths 2,434 2020-06-16 1.2 1.3 1.3
Ireland cases 114 2020-06-16 0.5 0.6 0.7
Ireland deaths 23 2020-06-16 0.5 0.7 1.0
Italy cases 2,012 2020-06-16 0.9 0.9 1.0
Italy deaths 407 2020-06-16 0.7 0.8 0.9
Peru cases 33,296 2020-06-16 0.9 0.9 0.9
Peru deaths 1,289 2020-06-16 1.1 1.2 1.3
South_Africa cases 22,654 2020-06-16 1.3 1.4 1.4
South_Africa deaths 488 2020-06-16 1.2 1.3 1.5
Spain cases 2,392 2020-06-15 0.9 0.9 1.0
Spain deaths 0 2020-06-15 0.0 0.0 0.1
United_Kingdom cases 9,458 2020-06-16 0.8 0.8 0.8
United_Kingdom deaths 1,139 2020-06-16 0.6 0.6 0.7

11.1.3 Top 10 Countries

Below we show various extremes of \(R_{t,m}\) where counts (deaths or cases) exceed 50 in the last week.

11.1.3.1 Lowest \(R_{t,m}\) based on Deaths

knitr::kable((table %>% filter(continent != "SA" & type == "deaths" & count > 50) %>% arrange(Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","), 
    digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
Country Estimated Type Count (Last Week) Week Ending R - Lower CI R - Mean R - Uppper CI
Belgium deaths 52 2020-06-16 0.4 0.5 0.6
Canada deaths 340 2020-06-16 0.6 0.6 0.7
United_Kingdom deaths 1,139 2020-06-16 0.6 0.6 0.7
Germany deaths 89 2020-06-16 0.5 0.7 0.8
France deaths 227 2020-06-16 0.6 0.7 0.8
Sweden deaths 197 2020-06-16 0.7 0.8 0.9
Turkey deaths 114 2020-06-16 0.7 0.8 0.9
Italy deaths 407 2020-06-16 0.7 0.8 0.9
United_States_of_America deaths 5,120 2020-06-16 0.8 0.8 0.8
Brazil deaths 6,825 2020-06-16 0.9 0.9 1.0

11.1.3.2 Lowest \(R_{t,m}\) based on Cases

knitr::kable((table %>% filter(continent != "SA" & type == "cases" & count > 50) %>% arrange(Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","), 
    digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
Country Estimated Type Count (Last Week) Week Ending R - Lower CI R - Mean R - Uppper CI
Malaysia cases 165 2020-06-16 0.3 0.4 0.4
Djibouti cases 223 2020-06-16 0.3 0.4 0.4
Uganda cases 59 2020-06-16 0.4 0.5 0.6
Cuba cases 62 2020-06-16 0.4 0.6 0.7
Ireland cases 114 2020-06-16 0.5 0.6 0.7
Canada cases 2,903 2020-06-16 0.7 0.7 0.7
Tajikistan cases 488 2020-06-16 0.7 0.7 0.8
Nicaragua cases 346 2020-06-16 0.7 0.7 0.8
Finland cases 107 2020-06-16 0.6 0.7 0.9
Puerto_Rico cases 844 2020-06-16 0.7 0.8 0.8

11.1.3.3 Highest \(R_{t,m}\) based on Deaths

knitr::kable((table %>% filter(continent != "SA" & type == "deaths" & count > 50) %>% arrange(-Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","), 
    digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
Country Estimated Type Count (Last Week) Week Ending R - Lower CI R - Mean R - Uppper CI
Cameroon deaths 68 2020-06-16 1.9 2.5 3.1
Yemen deaths 96 2020-06-16 1.9 2.3 2.8
Iraq deaths 282 2020-06-16 1.6 1.8 2.0
Dominican_Republic deaths 66 2020-06-16 1.2 1.6 1.9
Egypt deaths 401 2020-06-16 1.3 1.5 1.6
Philippines deaths 87 2020-06-16 1.2 1.4 1.7
South_Africa deaths 488 2020-06-16 1.2 1.3 1.5
Romania deaths 93 2020-06-16 1.1 1.3 1.6
India deaths 2,434 2020-06-16 1.2 1.3 1.3
Colombia deaths 418 2020-06-16 1.1 1.2 1.3

11.1.3.4 Highest \(R_{t,m}\) based on Cases

knitr::kable((table %>% filter(continent != "SA" & type == "cases" & count > 50) %>% arrange(-Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","), 
    digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
Country Estimated Type Count (Last Week) Week Ending R - Lower CI R - Mean R - Uppper CI
Eritrea cases 68 2020-06-16 12.1 15.6 19.4
China cases 184 2020-06-16 2.6 3.1 3.5
Benin cases 178 2020-06-16 2.4 2.8 3.2
Kosovo cases 223 2020-06-16 1.8 2.0 2.2
Yemen cases 348 2020-06-16 1.8 2.0 2.2
Rwanda cases 161 2020-06-16 1.7 2.0 2.3
Albania cases 327 2020-06-16 1.8 2.0 2.2
Congo cases 200 2020-06-16 1.6 1.9 2.1
Mauritania cases 940 2020-06-16 1.7 1.8 1.9
Eswatini cases 166 2020-06-16 1.5 1.7 2.0

11.2 Result Plots

Let’s generate our plots for each country/province in a list. We filter out weeks where the upper end of confidence interval for \(R_{t,m}\) exceeds five.

country_plots <- list()
for (c in levels(Rt_data$country)) {
    p1 <- ggplot(Rt_ci_data %>% filter(country == c & ci == "95%" & Rt_ui <= 5), aes(x = date_mid, y = Rt_mean)) + geom_crossbar(position = position_dodge2(padding = 0), 
        aes(ymin = Rt_li, ymax = Rt_ui, colour = type, fill = type), width = tau) + scale_fill_manual(values = c(alpha("deepskyblue4", 0.45), 
        alpha("#8b0068", 0.45))) + scale_colour_manual(values = c(alpha("deepskyblue4"), alpha("#8b0068"))) + scale_y_continuous(limits = c(0, 
        5)) + scale_x_date(date_breaks = "1 months", labels = date_format("%e %b")) + theme_bw() + theme(axis.text.x = element_text(angle = 45, 
        hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + xlab("Week") + ylab(expression(R[t, m])) + ggtitle(c) + 
        geom_hline(yintercept = 1)
    country_plots[[c]] <- p1
}

11.2.1 South Africa

Below we plot estimates for all the provinces. Note that [1] recommends only to start using estimates after at least 12 cases have occurred. On that basis we do not have estimates for R based on deaths for a number of provinces.

ggarrange(country_plots[["EC"]], country_plots[["FS"]], country_plots[["GP"]], country_plots[["KZN"]], country_plots[["LP"]], country_plots[["MP"]], 
    country_plots[["NC"]], country_plots[["NW"]], country_plots[["WC"]], country_plots[["SA"]], ncol = 2, nrow = 5) + ggtitle("Reproductive number by province, with 95% confidence intervals")

11.2.2 Other Countries

Below we plot all the countries:

ggarrange(country_plots[["Brazil"]], country_plots[["Canada"]], country_plots[["Chile"]], country_plots[["India"]], country_plots[["Ireland"]], 
    country_plots[["Italy"]], country_plots[["Peru"]], country_plots[["South_Africa"]], country_plots[["Spain"]], country_plots[["United_Kingdom"]], 
    ncol = 2, nrow = 5) + ggtitle("Reproductive number by country, with 95% confidence intervals")

11.2.3 Scatter Plots of \(R_{t,m}^{cases}\) vs. \(R_{t,m}^{deaths}\)

Below we plot estimates of \(R_{t,m}\) based on cases versus those based on deaths for countries where these estimates range between 0.5 and 2.

11.2.3.1 South Africa

Below we plot where there were more than 5 deaths and cases in the last week. It appears that, generally, \(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\) are consistent.

sp_data <- pivot_wider(table %>% filter(count >= 5 & continent == "SA"), id_cols = c("continent", "country"), names_from = "type", values_from = "Rt_mean")

ggplot(sp_data, aes(x = cases, y = deaths)) + geom_point(aes(label = country)) + geom_text(aes(label = country), hjust = 0, vjust = 0, size = 2) + 
    ggtitle("Reproductive number by country") + xlab(expression(R[t, m]^cases)) + ylab(expression(R[t, m]^deaths)) + scale_y_continuous(limits = c(0.5, 
    3)) + scale_x_continuous(limits = c(0.5, 3)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + 
    guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50) + geom_abline(a = 0, b = 1)

11.2.3.2 Other countries

Below we plot where there were more than 25 deaths and cases in the last week. It appears that, generally, \(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\) are consistent.

sp_data <- pivot_wider(table %>% filter(count >= 25 & continent != "SA"), id_cols = c("continent", "country"), names_from = "type", values_from = "Rt_mean")

ggplot(sp_data, aes(x = cases, y = deaths)) + geom_point(aes(color = continent, label = country)) + geom_text(aes(label = country), hjust = 0, 
    vjust = 0, size = 2) + ggtitle("Reproductive number by country") + xlab(expression(R[t, m]^cases)) + ylab(expression(R[t, m]^deaths)) + 
    scale_y_continuous(limits = c(0.5, 2)) + scale_x_continuous(limits = c(0.5, 2)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50) + geom_abline(a = 0, b = 1)

11.3 Mobility

11.3.1 Simple plot

Below we plot weeks of estimates of \(R_{t,m}\) versus the average mobility index that was constructed. There is a clear link between mobility and the reproductive number. As it decreases \(R_{t,m}\) reduces:

ggplot(Rt_data %>% filter(type == "cases" & count > 100 & country_code %in% c(country_codes)), aes(x = average_mobility, y = Rt_mean)) + geom_point(aes(colour = country)) + 
    ggtitle("Reproductive number by mobility") + ylab(expression(R[t, m]^cases)) + # scale_y_continuous(limits = c(0, 10)) +
theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + 
    scale_color_hue(l = 50)

11.3.2 Modelling Results

TO DO

12 Discussion

From the basic plots it’s clear that South Africa and a few other countries appear to be on a different “slope” than European countries show. These include Brazil, Peru, Chile and India.

The above shows estimates for reproductive number using cases and deaths (\(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\)) for each country \(m\) over time \(t\) in weeks.

From the current estimates it appears that, at present, in general South African provincial estimates based on cases and deaths correspond. An exception to this may be Eastern Cape with estimates based on deaths exceeding those based on cases.

It is also clear that estimates based on cases appear to precede the movements based on deaths over time. This can be expected as, all things being equal, infections and associated cases should be a precursor to deaths.

South Africa does not compare well, and appears to have one of the highest \(R_{t,m}^{deaths}\) in the world. This is higher than the \(R_{t,m}^{cases}\) which may be indicative of lower testing and/or backlogs with regard to testing.

On a scatter plot of countries most appear to have \(R_{t,m}^{cases}\) correlated well with \(R_{t,m}^{deaths}\). Chile and Afghanistan appear to be outliers with \(R_{t,m}^{deaths}\) higher than \(R_{t,m}^{cases}\) indicating perhaps limited testing, or alternatively epidemics that are slowing (given the leading nature of cases vs. deaths).

Overall it’s clear that having multiple measures of \(R_{t,m}\) appears useful and monitoring it’s value is something useful.

We also show that mobility is clearly correlated with \(R_{t,m}^{cases}\) in the countries shown.

Limitation of this method to estimate \(R_{t,m}\) are noted in [1]

  • It’s sensitive to changes in transmissibility, changes in contact patterns, depletion of the susceptible population and control measures.
  • It relies on an assumed serial interval assumptions.
  • The size of the time window can affect the volatility of results.
  • Results are time lagged with regards to true infection, more so in the case of the use of deaths.
  • It’s sensitive to changes in case (or death) detection.
  • The serial interval may change over time.

Despite these limitation we believe the ease of calculation of this method and the ability to use multiple sources makes it useful as a monitoring tool.

13 Detailed Output

Detailed output for all countries are saved to a comma-separated value file below. The file can be found here.

write.csv(Rt_data, "Rt_data.csv", row.names = FALSE)

14 Author

This report was prepared by Louis Rossouw. Please get in contact with Louis Rossouw if you have comments or wish to receive this regularly.

Louis Rossouw
Head of Research & Analytics
Gen Re | Life/Health Canada, South Africa, Australia, NZ, UK & Ireland
Email: LRossouw@GenRe.com Mobile: +27 71 355 2550

The views in this document represents that of the author and may not represent those of Gen Re. Also note that given the significant uncertainty involved with the parameters, data and methodology care should be taken with these numbers and any use of these numbers.

15 References

[1] A. Cori, N. M. Ferguson, C. Fraser, and S. Cauchemez, “A new framework and software to estimate time-varying reproduction numbers during epidemics,” American Journal of Epidemiology, vol. 178, no. 9, pp. 1505–1512, Sep. 2013, doi: 10.1093/aje/kwt133. [Online]. Available: https://doi.org/10.1093/aje/kwt133

[2] A. Cori, EpiEstim: EpiEstim: A package to estimate time varying reproduction numbers from epidemic curves. 2013 [Online]. Available: https://CRAN.R-project.org/package=EpiEstim

[3] V. Marivate et al., “Coronavirus disease (COVID-19) case data - South Africa.” Zenodo, 2020 [Online]. Available: https://zenodo.org/record/3888499

[4] European Centre for Disease Prevention and Control, “Data on the geographic distribution of COVID-19 cases worldwide.” European Union, 2020 [Online]. Available: https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide

[5] Google LLC, “Google COVID-19 community mobility reports.” 2020 [Online]. Available: https://www.google.com/covid19/mobility/

[6] P. Nouvellet et al., “Report 26: Reduction in mobility and COVID-19 transmission,” Imperial College London, 2020 [Online]. Available: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-26-mobility-transmission/

[7] N. Ferguson et al., “Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand,” Imperial College London, 2020 [Online]. Available: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-9-impact-of-npis-on-covid-19/

---
title: "Estimating the Reproductive Number of COVID-19"
author: "Louis Rossouw"
date: "`r Sys.Date()`"
output: 
  html_notebook:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    number_sections: true
    includes:
      in_header: google_analytics_header.html
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    number_sections: true
    includes:
      in_header: google_analytics_header.html
csl: ieee_with_url.csl
bibliography: references.bib
---

```{r setup, include=FALSE}
rm(list=ls())
knitr::opts_chunk$set(echo=TRUE, warning = FALSE, tidy=TRUE, dpi=300)
```

***THIS WORK IS YET TO BE REVIEWED***

# Introduction

This paper contains estimates for the reproductive number $R_{t,m}$ over time $t$ in various countries $m$.  It also shows the same for provinces within South Africa..  This is done using the methodology as described in [@Cori2013].  These have been implemented in R using `EpiEstim` package [@Cori2013a] which is what is used here.  

This paper and it's results should be updated roughly daily and [is available online](https://lrossouw.github.io/covid-19/estimating_r.nb.html?utm_source=paper&utm_medium=paper_url&utm_campaign=week2).

# Updates

As this paper is updated over time this section will summarise significant changes.  The code producing this paper is tracked using Git. The Git commit hash for this project at the time of generating this paper was `r system("git rev-parse HEAD", intern=TRUE)`.

**2020-06-12**

* Initial version.

**2020-06-13**

* Combine the version for South Africa and other countries.

**2020-06-14**

* Add Google Mobility data for comparison and further modelling.

**2020-06-15**

* Add more details about methodology.

# Libraries

The project uses the following libraries.

```{r libraries, warning=FALSE, message=FALSE}
require(EpiEstim)
require(EnvStats)
require(ggplot2)
require(ggpubr)
require(lubridate)
require(utils)
require(httr)
require(dplyr)
require(tidyr)
require(scales)
```

# Countries

All countries available in the data will be analysed but also the provinces of South Africa.  This paper produces charts for the following countries only:

* Brazil
* Canada
* Chile
* India
* Ireland
* Italy
* Peru
* South Africa
* Spain
* United Kingdom

```{r countries}
country_codes = c("BR", 
              "CA",
              "CL",
              "IN",
              "IE",
              "IT",
              "PE",
              "ZA",
              "ES",
              "UK"
              )
```

# Data

## South Africa

### Download
Data is downloaded from the Git repository associated with [@Marivate2020].

```{r download_data_za, warning=FALSE, message=FALSE}
# Provincial Deaths
GET(
  url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv",
  write_disk(
    "covid19za_provincial_cumulative_timeline_deaths.csv",
    overwrite = TRUE
  )
)

# Provincial Cases
GET(
  url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv",
  write_disk(
    "covid19za_provincial_cumulative_timeline_confirmed.csv",
    overwrite = TRUE
  )
)
```

### Data preperation

First read in the data from the downloaded comma-separated values text file.

```{r read_data_za}
# Read from CSVs above
data_za_cases <-
  read.csv(
    "covid19za_provincial_cumulative_timeline_confirmed.csv",
    stringsAsFactors = FALSE
  )
data_za_deaths <-
  read.csv("covid19za_provincial_cumulative_timeline_deaths.csv",
           stringsAsFactors = FALSE)
```

In the case data file row 21 and 32 contain no provincial details. We estimated it by spreading the national total to the provinces in proportion to a mixture of the prior day and the next day.

```{r fix_za_cases}
data_za_cases[21, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <-
  colSums(data_za_cases[c(20, 22), c("EC",
                                "FS",
                                "GP",
                                "KZN",
                                "LP",
                                "MP",
                                "NC",
                                "NW",
                                "WC",
                                "UNKNOWN")]) / sum(data_za_cases[c(20, 22), ]$total) * data_za_cases[21, ]$total
data_za_cases[32, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <-
  colSums(data_za_cases[c(31, 33), c("EC",
                                "FS",
                                "GP",
                                "KZN",
                                "LP",
                                "MP",
                                "NC",
                                "NW",
                                "WC",
                                "UNKNOWN")]) / sum(data_za_cases[c(31, 33), ]$total) * data_za_cases[32, ]$total

```

The following function will be applied to case and death data.  In this function the following occurs:

1. Scale up the per province data for unknown values.
2. This results in provincial data which are not whole numbers.  These are rounded to the nearest whole number.
3. A `SA` column is added as the sum of the new per province data.
4. Data is formatted and disaggregated such that item represents the incremental cases or deaths rather than cumulative figures.
5. Data is filled with data (albeit with 0 cases or deaths) for all dates in the range.
6. Any incremental case or death counts that are negative are zeroeised.
6. New cumulative figures are calculated.


```{r fix_data_za}
fix_data_za <-
  function(data_za,
           start_date = as.Date("2020-03-01"),
           end_date = as.Date("2020-03-31")) {
    # Scale provinces by scale factor (assume unknown are in proportion)
    data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <-
      data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] *
      (1 + data_za$UNKNOWN /
         rowSums(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")]))
    
    # Only select columns we need
    data_za <- data_za %>%
      select("date", "EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")
    data_za$date <- as.Date(data_za$date, "%d-%m-%Y")
    
    # Round data so we have integer cases
    data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <-
      round(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] , 0)
    
    # Calculate a new SA column
    data_za$SA <-
      rowSums(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")])
    
    # "Melt" the data
    data_za <-
      pivot_longer(
        data_za,
        cols = c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "SA"),
        names_to = "province",
        values_to = "count"
      )
    
    # Getting daily data from the cumulative data set
    data_za =  data_za %>%
      group_by(province) %>%
      arrange(date) %>%
      mutate(count = count - lag(count, default = 0)) %>%
      ungroup()

    # add missing dates
    all_dates <- expand_grid(date = seq(start_date, end_date, 1),
                             province = levels(as.factor(data_za$province)))
    
    # join
    data_za <- left_join(all_dates, data_za, by =c("date","province"))
    
    #province factor
    data_za$province <- as.factor(data_za$province)

    # 0 for NAs
    data_za$count <- ifelse(is.na(data_za$count), 0, data_za$count)
    
    # remove negatives
    data_za$count <- ifelse(data_za$count < 0, 0, data_za$count)
    
    # get cumulative counts
    data_za <- data_za %>%
      group_by(province) %>%
      mutate(cumulative_count = cumsum(count)) %>%
      ungroup()
    
    return(data_za)
  }
```

Below we use the function above to process deaths and cases and then combine them into a single dataset.

```{r apply_fix_data_za}
start_date <- min(as.Date(data_za_cases$date, "%d-%m-%Y"))
end_date <- max(as.Date(data_za_cases$date, "%d-%m-%Y"))
data_za_cases <-
  fix_data_za(data_za_cases, start_date = start_date, end_date = end_date)
data_za_deaths <-
  fix_data_za(data_za_deaths, start_date = start_date, end_date = end_date)

data_za_cases<-cbind("cases",data_za_cases)
data_za_deaths<-cbind("deaths",data_za_deaths)
colnames(data_za_cases)[1]<-"type"
colnames(data_za_deaths)[1]<-"type"

# combined
data_za <-
  rbind(data_za_cases,data_za_deaths)

# remove data sets no longer needed
rm("data_za_cases", "data_za_deaths", "start_date", "end_date", "fix_data_za")
```


## Other countries

### Download

Data for other countries are downloaded from [@ECDC2020].

```{r download_data, warning=FALSE, message=FALSE}
GET(
  url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv",
  write_disk(
    "ecdc_casedistribution.csv",
    overwrite = TRUE
  )
)
```

### Data preperation

First read in the data from the downloaded comma-separated values text file.

```{r read_data}
# Read from CSVs above
data <-
  read.csv(
    "ecdc_casedistribution.csv",
    stringsAsFactors = FALSE
  )
```

Update data by doing the following:

1. Format dates.
2. Set column names.
3. Add data where dates were skipped (with 0 cases or deaths)
4. "Melt" the data down.
5. Set fields as factors as needed.
6. Get cumulative numbers.

```{r update_data}
# get date
data$date <- as.Date(data$dateRep, format = "%d/%m/%Y")

# get cases and deaths
data <-
  data %>% select(continentExp, date, geoId, countriesAndTerritories, cases, deaths)

# column names
colnames(data) <-
  c("continent", "date", "country_code", "country", "cases", "deaths")

# add 0 data for dates that were skipped
for (c in levels(as.factor(data$country))) {
  c_dates <- (data %>% filter(country == c))$date
  check_dates <- seq(min(c_dates), max(c_dates), by = 1)
  missing_dates <- check_dates[!check_dates %in% c_dates]
  continent <- (data %>% filter(country == c))$continent[1]
  country_code <- (data %>% filter(country == c))$country_code[1]
  if (length(missing_dates) > 0) {
    missing_data <- data.frame(
      continent = continent,
      date = missing_dates,
      country_code = country_code,
      country = rep(c, length(missing_dates)),
      cases = rep(0, length(missing_dates)),
      deaths = rep(0, length(missing_dates))
    )
    data <- rbind(data, missing_data)
  }
}

# "Melt" the data
data <-
  pivot_longer(
    data,
    cols = c("cases", "deaths"),
    names_to = "type",
    values_to = "count"
  )

# fields as factors
data$continent <- as.factor(data$continent)
data$country_code <- as.factor(data$country_code)
data$country <- as.factor(data$country)
data$type <- as.factor(data$type)

# make sure it is a data frame
data <- as.data.frame(data)

# zeroise counts below 0
data$count <-  ifelse(data$count<0, 0 , data$count)

#order and get group by
data <-  data %>%
  group_by(country_code, country, type) %>%
  arrange(country_code, country, type, date) %>%
  mutate(cumulative_count = cumsum(count)) %>%
  ungroup()

```

## Google Mobility Data

Google released mobility data indexes that deviations from a baseline of movement during 3 January to 6 February 2020 [@Google2020].  This data is downloaded and prepared for linking to our estimates of $R_{t,m}$.

### Download

The data is made available as a comma-separated file which is updated regularly but not daily.  Data is sometimes up to a week behind.

```{r download_google_mobility, warning=FALSE, message=FALSE}
GET(
  url = "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv",
  write_disk(
    "google_mobility_data.csv",
    overwrite = TRUE
  )
)
```

### Preparing the mobility data

First read in the file:

```{r read_google_mobility}
# Read from CSVs above
google_mobility_data <-
  read.csv(
    "google_mobility_data.csv",
    stringsAsFactors = FALSE,
    na.strings = ""
  )
```

Below the data is prepared by doing the following:

1. Fixing dates
2. Retaining only country level mobility data, except for South Africa where country and provincial data is retained.
3. Renaming the fields consistently.
4. Various transformations to make joining to other data easier later.

```{r prep_google_mobility}
# fix dates
google_mobility_data$date <-
  as.Date(google_mobility_data$date, format = "%Y-%m-%d")

# keep only country level data but keep all data for ZA
google_mobility_data <- google_mobility_data %>%
  filter(country_region_code == "ZA" | is.na(sub_region_1))

# pick only the fields required
google_mobility_data <- google_mobility_data %>%
  select(
    country_region_code,
    country_region,
    sub_region_1,
    iso_3166_2_code,
    date,
    retail_and_recreation_percent_change_from_baseline,
    grocery_and_pharmacy_percent_change_from_baseline,
    parks_percent_change_from_baseline,
    transit_stations_percent_change_from_baseline,
    workplaces_percent_change_from_baseline,
    residential_percent_change_from_baseline
  )

# rename the fields
colnames(google_mobility_data) <- c(
  "country_code",
  "country",
  "province",
  "province_code",
  "date",
  "retail_and_recreation",
  "grocery_and_pharmacy",
  "parks",
  "transit_stations",
  "workplaces",
  "residential"
)

# move province data to country for ease later
google_mobility_data$country_code[!is.na(google_mobility_data$province_code)] <-
  google_mobility_data$province_code[!is.na(google_mobility_data$province_code)]
google_mobility_data$country[!is.na(google_mobility_data$province_code)] <-
  google_mobility_data$province[!is.na(google_mobility_data$province_code)]

# rename KZN consistently
google_mobility_data$country_code[google_mobility_data$country_code == "ZA-NL"] <-
  "ZA-KZN"

# drop province columns
google_mobility_data <- google_mobility_data[, c(-3,-4)]

# divide indexes by 100 to have numbers between 0 and 1
google_mobility_data[, 4:9] <- google_mobility_data[, 4:9] / 100

# fix GB code to UK
google_mobility_data$country_code[google_mobility_data$country_code == "GB"] <-
  "UK"

# make factors
google_mobility_data$country_code <-
  as.factor(google_mobility_data$country_code)
google_mobility_data$country <-
  as.factor(google_mobility_data$country)
```

Further to the above we also calculate an average mobility index per [@Nouvellet2020].  This index averages the mobility indexes but exclude the following indexes:

* Residential index is excluded as it's less likely to drive the epidemic (and is inversely correlated),
* Mobility in parks is also unlikely to be a driving factor.

```{r calc_average_mobility}
google_mobility_data$average_mobility <- (
  google_mobility_data$retail_and_recreation +
    google_mobility_data$grocery_and_pharmacy +
    google_mobility_data$transit_stations +
    google_mobility_data$workplaces
) / 4
```

## Combine South African data with other data

To simplify further analysis it's easier to combine all data in a single dataset:

```{r combine_data}
# Add a "continent" field to data_za and combine
data_za <- cbind("SA", data_za)
colnames(data_za)[1] <- "continent"
colnames(data_za)[4] <- "country"
data_za$country_code <- paste0("ZA-", data_za$country)
data <- rbind(data, data_za)

# remove all objects except what we need
rm(list = ls()[!ls() %in% c("data", "country_codes", "google_mobility_data")])
```

# Basic Exploration

## South Africa

Below we plot cumulative case count on a log scale by province:

```{r plot_cases_za}
ggplot(
  data %>% filter(continent == "SA" &
                    type == "cases" & cumulative_count > 0),
  aes(x = date, y = cumulative_count)
) +
  geom_line(aes(color = country), size = 1) +
  scale_y_log10(labels = comma) +
  ggtitle("Cumulative Cases by Province") +
  xlab("Date") +
  ylab("Cumulative Cases") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
  scale_color_hue(l = 50)
```

Below we plot the cumulative deaths by province on a log scale:

```{r plot_deaths_za}
ggplot(
  data %>% filter(continent == "SA" &
                    type == "deaths" & cumulative_count > 0),
  aes(x = date, y = cumulative_count)
) +
  geom_line(aes(color = country), size = 1) +
  scale_y_log10(labels = comma) +
  ggtitle("Cumulative Deaths by Province") +
  xlab("Date") +
  ylab("Cumulative Deaths") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
  scale_color_hue(l = 50)
```

Below we plot average mobility indexes by province:

```{r plot_mobility_province}
ggplot(data = google_mobility_data[substr(as.character(google_mobility_data$country_code), 1, 2) ==
                                     "ZA", ],
       aes(x = date, y = average_mobility, colour = country)) +
  geom_line() + scale_color_hue(l = 50) +
  theme_bw()
```

## Other Countries

Below we plot cumulative case count on a log scale by country:

```{r plot_cases}
ggplot(
  data %>% filter(
    continent != "SA" &
      type == "cases" &
      country_code %in% country_codes &
      cumulative_count > 0
  ),
  aes(x = date, y = cumulative_count)
) +
  geom_line(aes(color = country), size = 1) +
  scale_y_log10(labels = comma) +
  ggtitle("Cumulative Cases by Country") +
  xlab("Date") +
  ylab("Cumulative Cases") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
  scale_color_hue(l = 50)
```

Below we plot the cumulative deaths by country on a log scale:

```{r plot_deaths}
ggplot(
  data %>% filter(
    continent != "SA" &
      type == "deaths" &
      country_code %in% country_codes & cumulative_count > 0
  ),
  aes(x = date, y = cumulative_count)
) +
  geom_line(aes(color = country), size = 1) +
  scale_y_log10(labels = comma) +
  ggtitle("Cumulative Deaths by Country") +
  xlab("Date") +
  ylab("Cumulative Deaths") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
  scale_color_hue(l = 50)
```

Below we plot average mobility indexes by country:

```{r plot_mobility}
ggplot(data = google_mobility_data %>% filter(country_code %in% country_codes),
       aes(x = date, y = average_mobility, colour = country)) +
  geom_line() + scale_color_hue(l = 50) +
  theme_bw()
```

# Method & Assumptions

This analysis follows the follows the method proposed [@Cori2013].

Essentially this models the following relationship:

$E(I_{t,m})=R_{t,m}\sum_{s=1}^tI_{t-s,m}w_{s}$ where:

* $m$ is the country of province in case of South African data.
* $I_{t,m}$ is the number of cases (or deaths) reported at time $t$.
* $R_{t,m}$ is the "instantaneous reproduction number" to quote [@Cori2013]. 
* $I_{t,m} \sim Poisson$ with the above mean.
* $w_{s}$ is the infectivity profile of an individual over time $s$.

The formulation of instantaneous rate of transmission proposed in [@Cori2013] is convenient because it ensures that $R_{t,m}$ only depends on information that is known at time $t$.  We do not need to know about future transmission.

## Serial Interval

To do further analysis an serial interval assumption is needed ($w_{s}$).  The serial interval is taken from [@Ferguson2020].  It's assumed to be Gamma distributed with mean of 6.48 and standard deviation of 3.83.  This corresponds to the effective infectiousness of an individual since acquiring the infection themselves.  

We plot this serial distribution below:

```{r serial_interval}
si_mean <- 6.48
si_sd <- 3.83
si_cv <- si_sd / si_mean

serial_interval = rep(0, 1)
serial_interval[1] = (pgammaAlt(1.5, si_mean, si_cv) - pgammaAlt(0, si_mean, si_cv))
for (i in 2:50) {
  serial_interval[i] = (pgammaAlt(i + .5, si_mean, si_cv) - pgammaAlt(i - .5, si_mean, si_cv))
}
ggplot(
  data.frame(days = seq(1, 20, 1), serial_interval = serial_interval[1:20]),
  aes(y = serial_interval, x = days)
) + geom_line(size = 1) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
  scale_color_hue(l = 50)

```

## Time Window Size

```{r set_tau}
# sliding window
tau <- 7
```

[@Cori2013] describes a choice of a time window to estimate $R_{t,m}$.  Based on this analysis we chose a time window of `r tau`-days ($\tau$ in using notation of [@Cori2013]).  During this window  $R_{t,m}$ is assumed to be constant.  However information prior that is taken into account.

Note that the time window size does not interact with the serial interval as information prior to the window is taken into account.  It's just the period during which $R_{t,m}$ is assumed to be constant.

Based on the table in Appendix 2 of [@Cori2013] we believe a window of `r tau`-days to be reasonable as long as that window contains 12 or more cases or deaths.

# Estimate $R_{t,m}$

## Estimation Routine

The following code works backwards and fits $R_{t,m}$ values for whole weeks (ending on the last date in the data) using the `EpiEstim` package. Using deaths or infection the as the cases in the puts the estimate of $R_{t,m}$ as at the date of the cases being tracked.

So $t$ is based on the date of reporting (be that cases or deaths) and $m$ is the country or province. Two values are estimated for each country:

1. $R_{t,m}^{cases}$ is the reproductive number implied by the cases reported at time $t$ in country $m$.
2. $R_{t,m}^{deaths}$ is the reproductive number implied by the deaths reported at time $t$ in country $m$.

Note that the time periods are left unadjusted, though in reality the $R_{t,m}^{deaths}$ should be shifted back approximately 2 weeks relative to $R_{t,m}^{cases}$.

```{r estimate_rt}
Rt_data <- NULL
for (c in levels(data$country)) {
  for (t in levels(data$type)) {
    # filter out country data of type t
    c_data <- data %>% filter(country == c & type == t)
    
    # vector of count of cases/deaths
    I <- c_data$count
    
    # t=1 corresponds to this date:
    t1_date <- min(c_data$date)
    
    # only continue if c_data$cumulative_count > 12
    if (max(c_data$cumulative_count) >= 12) {
      # the day after the first case/death:
      t_start_date <-
        min((c_data %>% filter(cumulative_count >= 12))$date) + 1
      t_start <-
        min(seq(1, length(I))[c_data$cumulative_count >= 12]) + 1
      
      # last day of cases/deaths
      t_end <- length(I)
      t_end_date <-  t1_date + (t_end - 1)
      
      # how many full weeks do we have
      full_weeks <- floor((t_end - t_start) / tau)
      
      # only continue if we have 1 full week or more
      if (full_weeks > 0) {
        # then divide period into weeks
        T.Start <-
          seq(t_end - tau * full_weeks, t_end - tau, by = tau)
        T.End <- T.Start + tau
        
        # estimate $R_{t,m}^{type}$ for each week:
        c_Rt <- estimate_R(
          incid = I,
          config = make_config(
            t_start = T.Start,
            t_end = T.End,
            mean_si = si_mean,
            std_si = si_sd
          ),
          method = "parametric_si"
        )$R
        
        # count cases
        c_data$week <-
          floor(as.numeric(c_data$date - (t1_date + T.Start[1])) / tau)
        c_data$dd <-
          c(0, c_data$date[2:nrow(c_data)] - c_data$date[1:(nrow(c_data) - 1)])
        c_agg_data <- c_data %>%
          filter(week >= 0) %>%
          group_by(week) %>%
          summarise(count = sum(count), .groups = "drop")
        
        # add country and type designations
        c_Rt <-
          cbind(c_data$continent[1],
                c_data$country_code[1],
                c,
                t,
                c_Rt,
                c_agg_data$count)
        
        # and add dates
        c_Rt$date_start <- t1_date + c_Rt$t_start
        c_Rt$date_end <- t1_date + c_Rt$t_end - 1
        
        # combine the results
        if (is.null(Rt_data)) {
          Rt_data <- c_Rt
        } else {
          Rt_data <- rbind(Rt_data, c_Rt)
        }
      }
    }
  }
}
```

## Prepping data for tabulation and plots

Below we prep column headings and prepare CI data:

```{r prep_rt_data}
# Column names
colnames(Rt_data) <- c(
  "continent",
  "country_code",
  "country",
  "type",
  "t_start",
  "t_end",
  "Rt_mean",
  "Rt_std",
  "Rt_li_95",
  "Rt_li_90",
  "Rt_li_50",
  "Rt_median",
  "Rt_ui_50",
  "Rt_ui_90",
  "Rt_ui_95",
  "count",
  "date_start",
  "date_end"
)

# mid week data point
Rt_data$date_mid <-
  Rt_data$date_start + (Rt_data$date_end - Rt_data$date_start) / 2

# split out CI data into different dataset
Rt_ci_data <- NULL
for (ci in c("50", "90", "95")) {
  r <- data.frame(
    country_code = Rt_data$country_code,
    country = Rt_data$country,
    type = Rt_data$type,
    ci = rep(paste0(ci, "%"), nrow(Rt_data)),
    date_mid = Rt_data$date_mid,
    Rt_mean = Rt_data$Rt_mean,
    Rt_li = Rt_data[[paste0("Rt_li_", ci)]],
    Rt_ui = Rt_data[[paste0("Rt_ui_", ci)]]
  )
  Rt_ci_data <- rbind(r, Rt_ci_data)
}

# Factors
Rt_data$type <- as.factor(Rt_data$type)
Rt_data$country <- as.factor(Rt_data$country)

# remove what is not needed
rm(list = ls()[!ls() %in% c("data",
                            "country_codes",
                            "Rt_data",
                            "Rt_ci_data",
                            "google_mobility_data",
                            "tau")])
```

# Linking $R_{t,m}$ estimates to mobility

## Preparing `Rt_data`

Linking the mobility data should be done in a way that corresponds to the time the infection should have occurred not the time of death.  Below new date ranges are calculated to correspond roughly to the time of infection:

```{r calc_dates_infection}
Rt_data$date_inf_start <- Rt_data$date_start - ifelse(Rt_data$type=="death",23+7,6+7)
Rt_data$date_inf_end <- Rt_data$date_end - ifelse(Rt_data$type=="death",23+7,6+7)
Rt_data$date_inf_mid <- Rt_data$date_mid - ifelse(Rt_data$type=="death",23+7,6+7)
```


## Calculating weekly mobility data

The code below calculates average weekly mobility data averaged over the same date ranges as calculated above.

```{r calc_weekly_mobility}
weekly_mobility_data <- NULL
combined_country_codes <-
  levels(Rt_data$country_code)[levels(Rt_data$country_code) %in% levels(google_mobility_data$country_code)]
for (c in combined_country_codes) {
  for (t in levels(Rt_data$type)) {
    c_data <- Rt_data %>% filter(country_code == c & type == t)
    if (nrow(c_data) > 0) {
      for (i in 1:nrow(c_data)) {
        w_data <- google_mobility_data %>% filter(country_code == c &
                                                    date >= c_data$date_inf_start[i] &
                                                    date <= c_data$date_inf_end[i]) %>%
          group_by(country_code) %>%
          summarise(
            retail_and_recreation = mean(retail_and_recreation),
            grocery_and_pharmacy = mean(grocery_and_pharmacy),
            parks = mean(parks),
            transit_stations = mean(transit_stations),
            workplaces = mean(workplaces),
            residential = mean(residential),
            average_mobility = mean(average_mobility),
            .groups = "drop"
          )
        if (nrow(w_data) > 0) {
          if (!is.na(w_data$country_code)) {
            w_data$type <- t
            w_data$date_inf_mid <- c_data$date_inf_mid[i]
            weekly_mobility_data <-
              rbind(weekly_mobility_data, w_data)
          }
        }
      }
    }
  }
}
```

## Joining with $R_{t,m}$ estimates

Based on work above the weekly mobility data can be joined:
```{r join_mobility}
Rt_data <-
  left_join(Rt_data,
            weekly_mobility_data,
            by = c("country_code", "type", "date_inf_mid"))
```

# Modelling $R_{t,m}$ as a function of Mobility Indexes

**TO DO**

# Results

## Current $R_{t,m}$ estimates

Below current (last weekly) $R_{t,m}$ estimates are tabulated.  To so the data first needs to be prepared:

```{r prep_table}
# find the last estimates
last_dates <- Rt_data %>%
  group_by(country, type) %>%
  summarise(date_mid = max(date_mid), .groups = "drop")

# construct a table with nice fields names
table <-
  inner_join(last_dates, Rt_data, by = c("country", "type", "date_mid"))
table <- table %>%
  select(continent,
         country_code,
         country,
         type,
         count,
         date_end,
         Rt_li_95,
         Rt_mean,
         Rt_ui_95)
```

### South Africa

```{r tabulate_za}
knitr::kable((table %>% filter(continent == "SA"))[, c(-1,-2)],
             format.args = list(big.mark = ','),
             digits = 1,
             col.names =  c(
               "Country",
               "Estimated Type",
               "Count (Last Week)",
               "Week Ending",
               "R - Lower CI",
               "R - Mean",
               "R - Uppper CI"
             )
)
```

### Other Countries

The table below contains $R_{t,m}$ estimates for the last week available.  These estimates are based either on cases or deaths and a 95% confidence interval is shown.

```{r tabulate}
knitr::kable((table %>% filter(continent != "SA" & country_code %in% country_codes))[, c(-1,-2)],
             format.args = list(big.mark = ','),
             digits = 1,
             col.names =  c(
               "Country",
               "Estimated Type",
               "Count (Last Week)",
               "Week Ending",
               "R - Lower CI",
               "R - Mean",
               "R - Uppper CI"
             )
)
```

### Top 10 Countries

Below we show various extremes of $R_{t,m}$ where counts (deaths or cases) exceed 50 in the last week.

#### Lowest $R_{t,m}$ based on Deaths

```{r tabulate_low_deaths}
knitr::kable((table %>% filter(continent != "SA" & type=="deaths" & count>50) %>%
                arrange(Rt_mean))[1:10, c(-1,-2)],
             format.args = list(big.mark = ','),
             digits = 1,
             col.names =  c(
               "Country",
               "Estimated Type",
               "Count (Last Week)",
               "Week Ending",
               "R - Lower CI",
               "R - Mean",
               "R - Uppper CI"
             )
)
```

#### Lowest $R_{t,m}$ based on Cases

```{r tabulate_low_cases}
knitr::kable((table %>% filter(continent != "SA" & type=="cases" & count>50) %>%
                arrange(Rt_mean))[1:10, c(-1,-2)],
             format.args = list(big.mark = ','),
             digits = 1,
             col.names =  c(
               "Country",
               "Estimated Type",
               "Count (Last Week)",
               "Week Ending",
               "R - Lower CI",
               "R - Mean",
               "R - Uppper CI"
             )
)
```
#### Highest $R_{t,m}$ based on Deaths

```{r tabulate_high_deaths}
knitr::kable((table %>% filter(continent != "SA" & type=="deaths" & count>50) %>%
                arrange(-Rt_mean))[1:10, c(-1,-2)],
             format.args = list(big.mark = ','),
             digits = 1,
             col.names =  c(
               "Country",
               "Estimated Type",
               "Count (Last Week)",
               "Week Ending",
               "R - Lower CI",
               "R - Mean",
               "R - Uppper CI"
             )
)
```

#### Highest $R_{t,m}$ based on Cases

```{r tabulate_high_cases}
knitr::kable((table %>% filter(continent != "SA" & type=="cases" & count>50) %>%
                arrange(-Rt_mean))[1:10, c(-1,-2)],
             format.args = list(big.mark = ','),
             digits = 1,
             col.names =  c(
               "Country",
               "Estimated Type",
               "Count (Last Week)",
               "Week Ending",
               "R - Lower CI",
               "R - Mean",
               "R - Uppper CI"
             )
)
```

## Result Plots

Let's generate our plots for each country/province in a list.  We filter out weeks where the upper end of confidence interval for $R_{t,m}$ exceeds five.

```{r result_plots}
country_plots <- list()
for (c in levels(Rt_data$country)) {
  p1 <- ggplot(Rt_ci_data %>% filter(country == c &
                                       ci == "95%" &
                                       Rt_ui <= 5),
               aes(x = date_mid, y = Rt_mean)) +
    geom_crossbar(
      position = position_dodge2(padding = 0),
      aes(
        ymin = Rt_li,
        ymax = Rt_ui,
        colour = type,
        fill = type
      ),
      width = tau
    ) +
    scale_fill_manual(values = c(alpha("deepskyblue4", 0.45),
                                 alpha("#8b0068", 0.45))) +
    scale_colour_manual(values = c(alpha("deepskyblue4"),
                                   alpha("#8b0068"))) +
    scale_y_continuous(limits = c(0, 5)) +
    scale_x_date(date_breaks = "1 months", labels = date_format("%e %b")) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = "right") +
    guides(fill = guide_legend(ncol = 1)) +
    xlab("Week") +
    ylab(expression(R[t, m]))  +
    ggtitle(c) + geom_hline(yintercept = 1)
  country_plots[[c]] <- p1
}
```


### South Africa

Below we plot estimates for all the provinces.  Note that [@Cori2013] recommends only to start using estimates after at least 12 cases have occurred.  On that basis we do not have estimates for R based on deaths for a number of provinces.

```{r plot_r_provinces, fig.height=12, fig.width=12}
ggarrange(
  country_plots[["EC"]],
  country_plots[["FS"]],
  country_plots[["GP"]],
  country_plots[["KZN"]],
  country_plots[["LP"]],
  country_plots[["MP"]],
  country_plots[["NC"]],
  country_plots[["NW"]],
  country_plots[["WC"]],
  country_plots[["SA"]],
  ncol = 2,
  nrow = 5
) +
  ggtitle("Reproductive number by province, with 95% confidence intervals")
```

### Other Countries

Below we plot all the countries:

```{r plot_r_countries, fig.height=12, fig.width=12}
ggarrange(
  country_plots[["Brazil"]],
  country_plots[["Canada"]],
  country_plots[["Chile"]],
  country_plots[["India"]],
  country_plots[["Ireland"]],
  country_plots[["Italy"]],
  country_plots[["Peru"]],
  country_plots[["South_Africa"]],
  country_plots[["Spain"]],
  country_plots[["United_Kingdom"]],
  ncol = 2,
  nrow = 5
) +
  ggtitle("Reproductive number by country, with 95% confidence intervals")
```

### Scatter Plots of $R_{t,m}^{cases}$ vs. $R_{t,m}^{deaths}$

Below we plot estimates of $R_{t,m}$ based on cases versus those based on deaths for countries where these estimates range between 0.5 and 2.

#### South Africa

Below we plot where there were more than 5 deaths and cases in the last week.  It appears that, generally, $R_{t,m}^{cases}$ and $R_{t,m}^{deaths}$ are consistent.

```{r scatter_plot_za}
sp_data <- pivot_wider(
  table %>% filter(count>=5 & continent == "SA"), 
  id_cols=c("continent","country"),
  names_from = "type",
  values_from = "Rt_mean"
)

ggplot(
  sp_data,
  aes(x = cases, y = deaths)) +
  geom_point(aes(label=country)) +
  geom_text(aes(label=country),hjust=0, vjust=0, size=2) +
  ggtitle("Reproductive number by country") +
  xlab(expression(R[t,m]^cases)) +
  ylab(expression(R[t,m]^deaths)) +
  scale_y_continuous(limits = c(0.5, 3)) +
  scale_x_continuous(limits = c(0.5, 3)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
    scale_color_hue(l = 50) +
  geom_abline(a=0,b=1)
```

#### Other countries

Below we plot where there were more than 25 deaths and cases in the last week.  It appears that, generally, $R_{t,m}^{cases}$ and $R_{t,m}^{deaths}$ are consistent.

```{r scatter_plot}
sp_data <- pivot_wider(
  table %>% filter(count>=25 & continent != "SA"), 
  id_cols=c("continent","country"),
  names_from = "type",
  values_from = "Rt_mean"
)

ggplot(
  sp_data,
  aes(x = cases, y = deaths)) +
  geom_point(aes(color = continent, label=country)) +
  geom_text(aes(label=country),hjust=0, vjust=0, size=2) +
  ggtitle("Reproductive number by country") +
  xlab(expression(R[t,m]^cases)) +
  ylab(expression(R[t,m]^deaths)) +
  scale_y_continuous(limits = c(0.5, 2)) +
  scale_x_continuous(limits = c(0.5, 2)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
    scale_color_hue(l = 50) +
  geom_abline(a=0,b=1)
```

## Mobility

### Simple plot

Below we plot weeks of estimates of $R_{t,m}$ versus the average mobility index that was constructed. There is a clear link between mobility and the reproductive number.  As it decreases $R_{t,m}$ reduces:

```{r plot_mobility_vs_Rt_deaths}
ggplot(
  Rt_data %>% filter(type=="cases" & count>100 & 
                       country_code %in% c(country_codes)),
  aes(x = average_mobility, y = Rt_mean)) +
  geom_point(aes(colour=country)) +
  ggtitle("Reproductive number by mobility") +
  ylab(expression(R[t,m]^cases)) +
  #scale_y_continuous(limits = c(0, 10)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right") +
  guides(fill = guide_legend(ncol = 1)) +
    scale_color_hue(l = 50)
```

### Modelling Results

***TO DO***

# Discussion

From the basic plots it's clear that South Africa and a few other countries appear to be on a different "slope" than European countries show.  These include Brazil, Peru, Chile and India.

The above shows estimates for reproductive number using cases and deaths ($R_{t,m}^{cases}$ and $R_{t,m}^{deaths}$) for each country $m$ over time $t$ in weeks.

From the current estimates it appears that, at present, in general South African provincial estimates based on cases and deaths correspond.  An exception to this may be Eastern Cape with estimates based on deaths exceeding those based on cases.

It is also clear that estimates based on cases appear to precede the movements based on deaths over time.  This can be expected as, all things being equal, infections and associated cases should be a precursor to deaths.

South Africa does not compare well, and appears to have one of the highest $R_{t,m}^{deaths}$ in the world.  This is higher than the $R_{t,m}^{cases}$ which may be indicative of lower testing and/or backlogs with regard to testing.

On a scatter plot of countries most appear to have $R_{t,m}^{cases}$ correlated well with $R_{t,m}^{deaths}$.  Chile and Afghanistan appear to be outliers with $R_{t,m}^{deaths}$ higher than $R_{t,m}^{cases}$ indicating perhaps limited testing, or alternatively epidemics that are slowing (given the leading nature of cases vs. deaths).

Overall it's clear that having multiple measures of $R_{t,m}$ appears useful and monitoring it's value is something useful.

We also show that mobility is clearly correlated with $R_{t,m}^{cases}$ in the countries shown.  

Limitation of this method to estimate $R_{t,m}$ are noted in [@Cori2013]

* It's sensitive to changes in transmissibility, changes in contact patterns, depletion of the susceptible population and control measures.
* It relies on an assumed serial interval assumptions.
* The size of the time window can affect the volatility of results.
* Results are time lagged with regards to true infection, more so in the case of the use of deaths.
* It's sensitive to changes in case (or death) detection.
* The serial interval may change over time.

Despite these limitation we believe the ease of calculation of this method and the ability to use multiple sources makes it useful as a monitoring tool.

# Detailed Output

Detailed output for all countries are saved to a comma-separated value file below.  The file [can be found here](Rt_data.csv).

```{r write_csv}
write.csv(Rt_data,"Rt_data.csv",row.names = FALSE)
```

# Author

This report was prepared by Louis Rossouw.  Please get in contact with Louis Rossouw if you have comments or wish to receive this regularly.

**Louis Rossouw**  
Head of Research & Analytics  
Gen Re | Life/Health Canada, South Africa, Australia, NZ, UK & Ireland  
Email: [LRossouw@GenRe.com](mailto:LRossouw@genre.com) Mobile: +27 71 355 2550

***The views in this document represents that of the author and may not represent those of Gen Re.  Also note that given the significant uncertainty involved with the parameters, data and methodology care should be taken with these numbers and any use of these numbers.***

# References
